Perf/export fixes #1

borisfom · 2023-02-18T22:35:00Z

What does this PR do?

Add a one line overview of what this PR aims to accomplish.

Collection: [Note which collection this PR will affect]

Changelog

Add specific line by line info of high level changes in this PR.

Usage

You can potentially add a usage example below

# Add a code snippet demonstrating how to use this

Before your PR is "Ready for review"

Pre checks:

- Make sure you have read and followed the Contributor guidelines
- Did you write any new necessary tests?
- Did you add or update any necessary documentation?
- Does the PR affect components that are optional to install? (e.g., Numba, Pynini, Apex, etc.)
  - Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

- New Feature
- Bugfix
- Documentation

If you haven't finished some of the above items, you can still open a "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs in various areas.

Additional Information

Related to # (issue)

* per-micro-batch input loader
* per-micro-batch input loader set arg default val
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* minor fix
* apply per-microbatch-loader to only GPT
* update docstring on micro-batch input loader
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* fixed the default arg val
* fix batch size to 1 at log stat registration
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* update container for CI Signed-off-by: ericharper <[email protected]>
* update container in jenkinsfile Signed-off-by: ericharper <[email protected]>
* update container for CI Signed-off-by: ericharper <[email protected]> fix merge conflict
* revert Jenkinsfile
* Revert "revert Jenkinsfile" This reverts commit d23b775.
* Update nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py Signed-off-by: Tim Moon <[email protected]>
* add GradScaler
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
---------
Signed-off-by: ericharper <[email protected]>
Signed-off-by: Tim Moon <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: ericharper <[email protected]>
Co-authored-by: Tim Moon <[email protected]>

Signed-off-by: fayejf <[email protected]>

* Partial impl of ALSD alignment extraction Signed-off-by: smajumdar <[email protected]>
* Partial impl of ALSD alignment extraction Signed-off-by: smajumdar <[email protected]>
* Remove everything else Signed-off-by: smajumdar <[email protected]>
* Support dataclass in AbstractRNNTDecoding Signed-off-by: smajumdar <[email protected]>
* Add first draft unittest Signed-off-by: smajumdar <[email protected]>
* Correct the logic to move to the next timestep in the alignment Signed-off-by: smajumdar <[email protected]>
* Finalize ALSD alignment generation Signed-off-by: smajumdar <[email protected]>
* Add support for TSD greedy alignment extraction Signed-off-by: smajumdar <[email protected]>
* Add support for mAES greedy alignment extraction Signed-off-by: smajumdar <[email protected]>
* Finalize extraction of alignments from all beam algorithms for RNNT Signed-off-by: smajumdar <[email protected]>
* Style fixes Signed-off-by: smajumdar <[email protected]>
* Add copyright Signed-off-by: smajumdar <[email protected]>
* Address comments Signed-off-by: smajumdar <[email protected]>
---------
Signed-off-by: smajumdar <[email protected]>

* Base code for AWS SageMaker example Signed-off-by: SeanNaren <[email protected]>
* Remove format Signed-off-by: SeanNaren <[email protected]>
* wrap Signed-off-by: SeanNaren <[email protected]>
* Add a notebook with the code Signed-off-by: SeanNaren <[email protected]>
* Setup Signed-off-by: SeanNaren <[email protected]>
* Update notebook Signed-off-by: SeanNaren <[email protected]>
* Remove space Signed-off-by: SeanNaren <[email protected]>
* Fix spelling mistake Signed-off-by: SeanNaren <[email protected]>
* Add message to explain usage Signed-off-by: SeanNaren <[email protected]>
* Add CommonVoice esperanto example Signed-off-by: SeanNaren <[email protected]>
* Fix path Signed-off-by: SeanNaren <[email protected]>
* Fixes Signed-off-by: SeanNaren <[email protected]>
* Import sox locally, add documentation Signed-off-by: SeanNaren <[email protected]>
* Address reviews Signed-off-by: SeanNaren <[email protected]>
* Address reviews Signed-off-by: SeanNaren <[email protected]>
* Address reviews Signed-off-by: SeanNaren <[email protected]>
* Add cell to download the SSL model Signed-off-by: SeanNaren <[email protected]>
* Set max epochs to 300 Signed-off-by: SeanNaren <[email protected]>
* Fixes, introduce HF dataset instructions Signed-off-by: SeanNaren <[email protected]>
* Upstream updates from other branch Signed-off-by: SeanNaren <[email protected]>
* Fix warning Signed-off-by: SeanNaren <[email protected]>
* Add README, add image Signed-off-by: SeanNaren <[email protected]>
* Fix warning Signed-off-by: SeanNaren <[email protected]>
* Address feedback Signed-off-by: SeanNaren <[email protected]>
* Feedback Signed-off-by: SeanNaren <[email protected]>
---------
Signed-off-by: SeanNaren <[email protected]>

* Add papers from 2022/2022 to PUBLICATIONS.md Signed-off-by: smajumdar <[email protected]>
* Remove ipynb from being tracked as for nemo code library Signed-off-by: smajumdar <[email protected]>
* Remove ipynb from being tracked as for nemo code library Signed-off-by: smajumdar <[email protected]>
* Add additional papers Signed-off-by: smajumdar <[email protected]>
---------
Signed-off-by: smajumdar <[email protected]>

Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]>

…it tests (#5980) (#5984) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]>

…ict-0.7b_nv22.10.txt (#5869)
* removed WHATEVER(1) ˌhwʌˈtɛvɚ Signed-off-by: MikyasDesta <[email protected]>
* remove WHATEVER(1) and WHATEVER's(1) Signed-off-by: MikyasDesta <[email protected]>
* removed nv22.10.txt Signed-off-by: MikyasDesta <[email protected]>
* added updated and removed words to notes Signed-off-by: MikyasDesta <[email protected]>
* sign off Signed-off-by: MikyasDesta <[email protected]>
---------
Signed-off-by: MikyasDesta <[email protected]>
Co-authored-by: Mikyas Desta <[email protected]>

* Megatron positional encoding alibi fix (#5808) (#5863) * 1. Debugging. * 1. Debugging. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * 1. Debugging. * 1. Debugging. * 1. Fixed initialization. Signed-off-by: Micha Livne <[email protected]> * 1. Debugging. * 1. Debugging. * 1. Debugging. * 1. Debugging. * 1. Debugging. * 1. Debugging. * 1. Debugging. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * 1. Debugging. * 1. Removed scale from ALiBi. Signed-off-by: Micha Livne <[email protected]> * 1. Updated yaml and added support to control number of alibi heads. Signed-off-by: Micha Livne <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * 1. Removed num_attention_heads_alibi from configs. Signed-off-by: Micha Livne <[email protected]> Signed-off-by: Micha Livne <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Micha Livne <[email protected]> Signed-off-by: Micha Livne <[email protected]> Co-authored-by: Micha Livne <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Micha Livne <[email protected]> Signed-off-by: Jason <[email protected]> * Fix segmenting for pcla inference (#5849) * Fix segmenting for pcla inference Signed-off-by: Matvei Novikov <[email protected]> * Fix segmenting for pcla inference Signed-off-by: Matvei Novikov <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Matvei Novikov <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Jason <[email protected]> * indentation fix (#5861) (#5862) Signed-off-by: nithinraok <[email protected]> Signed-off-by: nithinraok <[email 
protected]> Signed-off-by: nithinraok <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Signed-off-by: Jason <[email protected]> * add ambernet to readme (#5872) (#5873) Signed-off-by: fayejf <[email protected]> Signed-off-by: fayejf <[email protected]> Signed-off-by: fayejf <[email protected]> Co-authored-by: fayejf <[email protected]> Signed-off-by: Jason <[email protected]> * Fix wrong label mapping in batch_inference for label_model (#5767) (#5870) * fix batch inference * add test for batch * fix device Signed-off-by: fayejf <[email protected]> Co-authored-by: fayejf <[email protected]> Signed-off-by: Jason <[email protected]> * WAR for https://github.com/pytorch/pytorch/pull/91526 Signed-off-by: Boris Fomitchev <[email protected]> Signed-off-by: Jason <[email protected]> * Fix memory allocation of NeMo Multi-speaker Data Simulator (#5864) * fix data simulator Signed-off-by: stevehuang52 <[email protected]> * update Signed-off-by: stevehuang52 <[email protected]> * update Signed-off-by: stevehuang52 <[email protected]> * Adding noise_manifest handling for faster speed Signed-off-by: Taejin Park <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Added multi-gpu feature Signed-off-by: Taejin Park <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Added a parameter for noise source file number Signed-off-by: Taejin Park <[email protected]> * Fixed noise_manifest error bug Signed-off-by: Taejin Park <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: stevehuang52 <[email protected]> Signed-off-by: Taejin Park <[email protected]> Co-authored-by: Taejin Park <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Jason <[email protected]> * 
RETRO model finetuning (#5800) * add save and load dynmaic index Signed-off-by: Yi Dong <[email protected]> * add chunk stride feature Signed-off-by: Yi Dong <[email protected]> * add chunk stride feature Signed-off-by: Yi Dong <[email protected]> * add no pq index Signed-off-by: Yi Dong <[email protected]> * added megatron lm compatible mode Signed-off-by: Yi Dong <[email protected]> * addd config Signed-off-by: Yi Dong <[email protected]> * fix position embedding Signed-off-by: Yi Dong <[email protected]> * added index factory Signed-off-by: Yi Dong <[email protected]> * share neighbors and weights amoung strategies Signed-off-by: Yi Dong <[email protected]> * fix bug Signed-off-by: Yi Dong <[email protected]> * added metric tto faiss index Signed-off-by: Yi Dong <[email protected]> * set default to inner product Signed-off-by: Yi Dong <[email protected]> * added qa fine tuen dataset Signed-off-by: Yi Dong <[email protected]> * added fine tuning code Signed-off-by: Yi Dong <[email protected]> * trim it Signed-off-by: Yi Dong <[email protected]> * fix data issue Signed-off-by: Yi Dong <[email protected]> * fix style Signed-off-by: Yi Dong <[email protected]> * added version Signed-off-by: Yi Dong <[email protected]> * fix key error Signed-off-by: Yi Dong <[email protected]> * make sure to overwrite the cfg Signed-off-by: Yi Dong <[email protected]> * make multiple sentence bert available Signed-off-by: Yi Dong <[email protected]> * fix the document Signed-off-by: Yi Dong <[email protected]> * fix the table Signed-off-by: Yi Dong <[email protected]> * fix transformer Signed-off-by: Yi Dong <[email protected]> * make sure to turn off the rope in chunked cross attention layer Signed-off-by: Yi Dong <[email protected]> * fix the security issue Signed-off-by: Yi Dong <[email protected]> * style fix Signed-off-by: Yi Dong <[email protected]> * fix codeql issues Signed-off-by: Yi Dong <[email protected]> * fix Signed-off-by: Yi Dong <[email protected]> * use -1 
Signed-off-by: Yi Dong <[email protected]> * fix empty index Signed-off-by: Yi Dong <[email protected]> * clean up Signed-off-by: Yi Dong <[email protected]> * fix the lower bound for repetition penalty Signed-off-by: Yi Dong <[email protected]> * add retro qa inference strategy Signed-off-by: Yi Dong <[email protected]> * added new inference logic Signed-off-by: Yi Dong <[email protected]> * working inference Signed-off-by: Yi Dong <[email protected]> * fix TP inference Signed-off-by: Yi Dong <[email protected]> * revert requirement Signed-off-by: Yi Dong <[email protected]> * added file inference Signed-off-by: Yi Dong <[email protected]> * use string to prevent collison Signed-off-by: Yi Dong <[email protected]> * use NQ test Signed-off-by: Yi Dong <[email protected]> * fix prompt Signed-off-by: Yi Dong <[email protected]> * fix inference Signed-off-by: Yi Dong <[email protected]> * set good defaults for demo Signed-off-by: Yi Dong <[email protected]> * replicate adlr Signed-off-by: Yi Dong <[email protected]> * make sure to turn off attention reset for megatron lm compatible model Signed-off-by: Yi Dong <[email protected]> * style fix Signed-off-by: Yi Dong <[email protected]> * fix typo Signed-off-by: Yi Dong <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix inference error Signed-off-by: Yi Dong <[email protected]> * fix logging Signed-off-by: Yi Dong <[email protected]> * address comments Signed-off-by: Yi Dong <[email protected]> --------- Signed-off-by: Yi Dong <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Jason <[email protected]> * [TTS] GAN-based spectrogram enhancer (#5565) * [TTS] add SpectrogramEnhancer based on StyleGAN 2 Signed-off-by: Roman Korostik <[email protected]> * [TTS] some tests for spectrogram enhancer Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: a 
tiny clean up Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: log images during training Signed-off-by: Roman Korostik <[email protected]> * exp_manager: pass save_on_train_epoch_end to checkpointing callback Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: add training script and config examples Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: fix comments Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: don't assume FastPitch Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: better input shapes handling Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: fix porting error Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: fix logging and .nemo saving Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: clean up scaling Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: formatting Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: update examples Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: shape handling Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: remove LoggerCollection handling Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: copyright notice for tests Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: use process_batch helper Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: return empty list of available models Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: some docs Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: style --fix Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: chan_last -> channel_last Signed-off-by: Roman Korostik 
<[email protected]> * [TTS] SpectrogramEnhancer: remove unused imports Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: remove unused return value Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: losses are nn.Modules now Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: init optimizers from config Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: formatting Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: unused imports Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: typechecking Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: more tests Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: fix logging images Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: unclutter prepare_batch Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: init generator and discriminator from the config for consistency with other NeMo models Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: update spectrogram range in the example config Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: comment on loss weights in the example config Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: rename Conv2DMod to Conv2DModulated Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: remove unused imports Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: fix CodeQL import warnings Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: type_as_recursive -> to_device_recursive Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: move to_device_recursive to helpers Signed-off-by: Roman Korostik <[email protected]> * [TTS] 
SpectrogramEnhancer: move losses to a separate module, add comments Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: add optimizers' entries to config Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: fix test configs Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: support length masking for 3-dim tensors Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: add masking to spectrogram normalization Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: fix tests Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: add spectrogram normalization tests Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: fix imports and formatting in tests Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: fix docstring typo Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: rename G and D to generator and discriminator Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: better argument naming in interfaces (condition -> input_spectograms, target -> target_spectrograms) Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: formatting Signed-off-by: Roman Korostik <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [TTS] SpectrogramEnhancer: fix import warnings in modules Signed-off-by: Roman Korostik <[email protected]> * [TTS] add resynthesize_dataset.py script Signed-off-by: Roman Korostik <[email protected]> * [TTS] add PairedRealFakeSpectrogramsDataset Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: update example config to reflect new data setup Signed-off-by: Roman Korostik <[email protected]> * [TTS] resynthesize_dataset.py: remove unused imports Signed-off-by: Roman Korostik 
<[email protected]> * [TTS] resynthesize_dataset.py: use nemo manifest handling Signed-off-by: Roman Korostik <[email protected]> * [TTS] resynthesize_dataset.py: remove unused import Signed-off-by: Roman Korostik <[email protected]> * [TTS] resynthesize_dataset.py: underscores for .npy names Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: remove return value from a test Signed-off-by: Roman Korostik <[email protected]> * [TTS] add length masking helper Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: use common tts length mask function Signed-off-by: Roman Korostik <[email protected]> * [TTS] unused imports in tts helpers Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: fix an import Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: introduce computed upsample_factor to generator Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: clean up and clarify validation data setup Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: remove a hardcoded path in the example config Signed-off-by: Roman Korostik <[email protected]> * [TTS] SpectrogramEnhancer: configurize max_spectrogram_length in generator Signed-off-by: Roman Korostik <[email protected]> * [TTS] resynthesize_dataset.py: consistent dashes and underscores in CLI args Signed-off-by: Roman Korostik <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Roman Korostik <[email protected]> Signed-off-by: Roman Korostik <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Jason <[email protected]> * Optimizing distributed Adam when running with one work queue (#5560) * Dist Adam constructs a single param bucket for each GPT layer Signed-off-by: Tim Moon <[email protected]> * 
Synchronize dist Adam reduce-scatters before launching model-parallel all-reduces Signed-off-by: Tim Moon <[email protected]> * Configure per-layer dist Adam buckets for BERT and T5 Signed-off-by: Tim Moon <[email protected]> * Remove unused variables Signed-off-by: Tim Moon <[email protected]> * Configure GPT with one dist Adam bucket per virtual pipeline stage Signed-off-by: Tim Moon <[email protected]> * Configure BERT with one dist Adam bucket per virtual pipeline stage Signed-off-by: Tim Moon <[email protected]> * Update Apex commit in Dockerfile Need recent updates to Apex distributed Adam optimizer. Signed-off-by: Tim Moon <[email protected]> * Remove logic for per-virtual-pipeline distopt buckets from T5 Signed-off-by: Tim Moon <[email protected]> --------- Signed-off-by: Tim Moon <[email protected]> Signed-off-by: Jason <[email protected]> * fix(readme): fix typo (#5883) Signed-off-by: Jean-Louis Queguiner <[email protected]> Signed-off-by: Jason <[email protected]> * TTS inference with Heteronym classification model, hc model inference refactoring (#5768) * refactor inference, fix span detection Signed-off-by: ekmb <[email protected]> * fix merge conflicts Signed-off-by: ekmb <[email protected]> * fix merge conflicts Signed-off-by: ekmb <[email protected]> * remove unused var Signed-off-by: ekmb <[email protected]> * clean up, test update Signed-off-by: ekmb <[email protected]> * arg name update Signed-off-by: ekmb <[email protected]> * merge wip Signed-off-by: ekmb <[email protected]> * revert changes Signed-off-by: ekmb <[email protected]> * update docs, move heteronym to baseg2p Signed-off-by: ekmb <[email protected]> * change wordid file defaults to none Signed-off-by: ekmb <[email protected]> * add manifest check Signed-off-by: ekmb <[email protected]> * replace homograph with heteronym, upper case wordid for riva, review feedback Signed-off-by: ekmb <[email protected]> * add log message, update comment Signed-off-by: ekmb <[email protected]> * 
rename test manifest field Signed-off-by: ekmb <[email protected]> --------- Signed-off-by: ekmb <[email protected]> Signed-off-by: Jason <[email protected]> * take out retro doc (#5885) (#5886) Signed-off-by: Yi Dong <[email protected]> Co-authored-by: Yi Dong <[email protected]> Signed-off-by: Jason <[email protected]> * Add option to disable distributed parameters in distributed Adam optimizer (#5685) * Add option to run dist Adam without distributed params Similar to DDP, but leverages dist Adam's support for overlapping communication with backward compute Signed-off-by: Tim Moon <[email protected]> * Fix bug in grad clipping when dist Adam has redundant params Signed-off-by: Tim Moon <[email protected]> --------- Signed-off-by: Tim Moon <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Jason <[email protected]> * [ASR] Separate Audio-to-Text (BPE, Char) dataset construction (#5774) * Separate full BPE dataset construction Signed-off-by: Vladimir Bataev <[email protected]> * Fix the case when the dataset is None Signed-off-by: Vladimir Bataev <[email protected]> * Fix comment Signed-off-by: Vladimir Bataev <[email protected]> * Fix typos Signed-off-by: Vladimir Bataev <[email protected]> * Separate char dataset construction. Fix DALI dataset usage. 
Signed-off-by: Vladimir Bataev <[email protected]> --------- Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Jason <[email protected]> * transformer duration added and IPA config files added Signed-off-by: Jason <[email protected]> * inference issue for pace resolved Signed-off-by: Jason <[email protected]> * Latest ONNX develpoments Signed-off-by: Boris Fomitchev <[email protected]> Signed-off-by: Jason <[email protected]> * Remove MCD_DTW tarball (#5889) Signed-off-by: Jocelyn Huang <[email protected]> Signed-off-by: Jason <[email protected]> * Block large files from being merged into NeMo main (#5898) * Attempt to use large-file pre-commit ci hook Signed-off-by: SeanNaren <[email protected]> * Set defaults and enforce Signed-off-by: SeanNaren <[email protected]> * Set to 1000 Signed-off-by: SeanNaren <[email protected]> * Remove enforcement Signed-off-by: SeanNaren <[email protected]> --------- Signed-off-by: SeanNaren <[email protected]> Signed-off-by: Jason <[email protected]> * Reduce memory usage in getMultiScaleCosAffinityMatrix function (#5876) * Updated offline_clustering.py, the getMultiScaleCosAffinityMatrix function, reduced memory usage Signed-off-by: gabitza-tech <[email protected]> * torch.empty.cache() outside forward_infer() Signed-off-by: Taejin Park <[email protected]> * Removed unnecessary lines Signed-off-by: Taejin Park <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Speed up for non torch.jit.script Signed-off-by: Taejin Park <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * parallelism is default off Signed-off-by: Taejin Park <[email protected]> * nme_mat_size is unified as 512, removing redundant docstring Signed-off-by: Taejin Park <[email protected]> --------- Signed-off-by: gabitza-tech <[email protected]> Signed-off-by: Taejin Park <[email protected]> Co-authored-by: 
Taejin Park <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Jason <[email protected]> * set max_steps for lr decay through config (#5780) * set max_steps for lr decay through config * added warning for optim sched max_steps config option * reverted changes to modelPT and updated megatron_base_model * added the experimental cosine annealing scheduler class * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update decay_steps for consine annealing exp class * added copyright --------- Co-authored-by: ANMOL GUPTA <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Jason <[email protected]> * Fix transducer and question answering tutorial bugs bugs (#5809) (#5810) Co-authored-by: Zhilin Wang <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Jason <[email protected]> * update apex install instructions (#5901) (#5902) Signed-off-by: ericharper <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Jason <[email protected]> * Hybrid ASR-TTS models (#5659) Add hybrid ASR-TTS models and text-to-text dataset Signed-off-by: Vladimir Bataev <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Jason <[email protected]> * Set providers for ORT inference session (#5903) Signed-off-by: athitten <[email protected]> Signed-off-by: Jason <[email protected]> * [ASR] Configurable metrics for audio-to-audio + removed experimental decorators (#5827) * Added an option to configure metrics for audio-to-audio models Removed experimental decorators Signed-off-by: Ante Jukić <[email protected]> * Addressed review comments Signed-off-by: Ante Jukić <[email protected]> --------- Signed-off-by: Ante Jukić <[email 
protected]> Signed-off-by: Jason <[email protected]> * Correct doc for RNNT transcribe() function (#5904) Signed-off-by: smajumdar <[email protected]> Signed-off-by: Jason <[email protected]> * Add segmentation export to Audacity label file (#5857) * Save the segmentation as label file for Audacity Audacity is a free open source audio editor that can import label file to quickly assess the segmentation quality. This commit add the export to [Audacity label format](https://manual.audacityteam.org/man/importing_and_exporting_labels.html) so that directly after running the segmentation tool the segmentation quality can be assessed or the segmentation can be shared easily. Signed-off-by: CaraDuf <[email protected]> * Fix styling Signed-off-by: CaraDuf <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove unused score in audacity export score is not written in audacity label file so we can safely not load it from segment. 
Signed-off-by: CaraDuf <[email protected]> --------- Signed-off-by: CaraDuf <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Jason <[email protected]> * Cross-Lingual objectives (XLM) and multilingual (many-many) support for Megatron-NMT (#5026) * Update blendable dataset, and refactor seq2seq data Signed-off-by: MaximumEntropy <[email protected]> * Blendable dataset with binarized mmap working Signed-off-by: MaximumEntropy <[email protected]> * Pass seed from cfg to dataset Signed-off-by: MaximumEntropy <[email protected]> * Fix multilingual setup Signed-off-by: MaximumEntropy <[email protected]> * Add on epoch start reconfiguration Signed-off-by: MaximumEntropy <[email protected]> * Style Signed-off-by: MaximumEntropy <[email protected]> * Update tokenizer creation for multilingual Signed-off-by: MaximumEntropy <[email protected]> * Tmp Signed-off-by: MaximumEntropy <[email protected]> * Update NMT script Signed-off-by: MaximumEntropy <[email protected]> * Remove unused import Signed-off-by: MaximumEntropy <[email protected]> * Update training script Signed-off-by: MaximumEntropy <[email protected]> * Log consumed samples Signed-off-by: MaximumEntropy <[email protected]> * Logging on val epoch end Signed-off-by: MaximumEntropy <[email protected]> * Style Signed-off-by: MaximumEntropy <[email protected]> * Remove redundant print Signed-off-by: MaximumEntropy <[email protected]> * Ckpt averaging for non model parallel megatron models Signed-off-by: MaximumEntropy <[email protected]> * Style Signed-off-by: MaximumEntropy <[email protected]> * Empty Signed-off-by: MaximumEntropy <[email protected]> * Update error message Signed-off-by: MaximumEntropy <[email protected]> * Style Signed-off-by: MaximumEntropy <[email protected]> * Remove check Signed-off-by: MaximumEntropy <[email protected]> * Restore fixes Signed-off-by: MaximumEntropy <[email protected]> * Remove ipdb Signed-off-by: 
MaximumEntropy <[email protected]> * Fixes Signed-off-by: MaximumEntropy <[email protected]> * Move to classmethods Signed-off-by: MaximumEntropy <[email protected]> * Initial Signed-off-by: MaximumEntropy <[email protected]> * 1. Debugging. Signed-off-by: Micha Livne <[email protected]> * Refactor masking to add skip_masking_id and working xlm bert and t5 datasets Signed-off-by: MaximumEntropy <[email protected]> * 1. Debugging. Signed-off-by: Micha Livne <[email protected]> * 1. Testing a simple solution Signed-off-by: Micha Livne <[email protected]> * 1. Fixed. Seems to work. Need to validate. Signed-off-by: Micha Livne <[email protected]> * 1. Added support in CSV and text memmap toMEgatron encoder-decoder Signed-off-by: Micha Livne <[email protected]> * 1. Added support in CSV. Signed-off-by: Micha Livne <[email protected]> * 1. Fixed style. Signed-off-by: Micha Livne <[email protected]> * 1. Fixed style. 2. Fixed bugs. Signed-off-by: Micha Livne <[email protected]> * 1. Debugging. Signed-off-by: Micha Livne <[email protected]> * 1. Fixed bugs. Signed-off-by: Micha Livne <[email protected]> * 1. Fixed style. Signed-off-by: Micha Livne <[email protected]> * 1. Updated yaml. Signed-off-by: Micha Livne <[email protected]> * Minor Signed-off-by: MaximumEntropy <[email protected]> * 1. Fixed warnings. Signed-off-by: Micha Livne <[email protected]> * 1. Fixed style. Signed-off-by: Micha Livne <[email protected]> * 1. Fixed style. Signed-off-by: Micha Livne <[email protected]> * 1. Fixed a bug. 
Signed-off-by: Micha Livne <[email protected]> * Tmp Signed-off-by: MaximumEntropy <[email protected]> * Updates Signed-off-by: MaximumEntropy <[email protected]> * Fix minor data things Signed-off-by: MaximumEntropy <[email protected]> * Fixes Signed-off-by: MaximumEntropy <[email protected]> * Lang ids for validation datasets Signed-off-by: MaximumEntropy <[email protected]> * More fixes for lang id code at inference Signed-off-by: MaximumEntropy <[email protected]> * Fix Signed-off-by: MaximumEntropy <[email protected]> * Fix Signed-off-by: MaximumEntropy <[email protected]> * Remove pdb Signed-off-by: MaximumEntropy <[email protected]> * Fix prepend ID and bleu logging Signed-off-by: MaximumEntropy <[email protected]> * Refactor Signed-off-by: MaximumEntropy <[email protected]> * Fixes for many-many NMT Signed-off-by: MaximumEntropy <[email protected]> * Fix Signed-off-by: MaximumEntropy <[email protected]> * Reset o2 default Signed-off-by: MaximumEntropy <[email protected]> * Style Signed-off-by: MaximumEntropy <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Restore dataset utils Signed-off-by: MaximumEntropy <[email protected]> * Fix Signed-off-by: MaximumEntropy <[email protected]> * Allreduce bleu scores Signed-off-by: MaximumEntropy <[email protected]> * Fix Signed-off-by: MaximumEntropy <[email protected]> * 1. Loading index file into memmap object. Signed-off-by: Micha Livne <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * 1. Fixed style. Signed-off-by: Micha Livne <[email protected]> * 1. Fixed extentin when loading files. 
Signed-off-by: Micha Livne <[email protected]> * Fix Signed-off-by: MaximumEntropy <[email protected]> * Fix redundant building Signed-off-by: MaximumEntropy <[email protected]> * PP > 2 for NMT Signed-off-by: MaximumEntropy <[email protected]> * Fixes Signed-off-by: MaximumEntropy <[email protected]> * Fixes Signed-off-by: MaximumEntropy <[email protected]> * Style Signed-off-by: MaximumEntropy <[email protected]> * Fix Signed-off-by: MaximumEntropy <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Merge and fix Signed-off-by: MaximumEntropy <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: MaximumEntropy <[email protected]> * Refactor multilingual again Signed-off-by: MaximumEntropy <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixes Signed-off-by: MaximumEntropy <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Refactor and verify data formats Signed-off-by: MaximumEntropy <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * cleanup Signed-off-by: MaximumEntropy <[email protected]> * more fixes Signed-off-by: MaximumEntropy <[email protected]> * Fix passing langs Signed-off-by: MaximumEntropy <[email protected]> * Fix Signed-off-by: MaximumEntropy <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixes Signed-off-by: MaximumEntropy <[email protected]> * Fixes Signed-off-by: MaximumEntropy <[email protected]> * More fixes Signed-off-by: MaximumEntropy <[email protected]> * Fixes for bart Signed-off-by: MaximumEntropy <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more 
information, see https://pre-commit.ci --------- Signed-off-by: MaximumEntropy <[email protected]> Signed-off-by: Micha Livne <[email protected]> Signed-off-by: Micha Livne <[email protected]> Co-authored-by: Micha Livne <[email protected]> Co-authored-by: Micha Livne <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Micha Livne <[email protected]> Signed-off-by: Jason <[email protected]> * ONNX export working Signed-off-by: Boris Fomitchev <[email protected]> Signed-off-by: Jason <[email protected]> * Fixing unit test Signed-off-by: Boris Fomitchev <[email protected]> Signed-off-by: Jason <[email protected]> * Update isort to the latest version (#5895) Update isort to the latest version Signed-off-by: Vladimir Bataev <[email protected]> --------- Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Jason <[email protected]> * Pin isort version (#5914) Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Jason <[email protected]> * Moved eval notebook data to aws (#5911) Signed-off-by: Jocelyn Huang <[email protected]> Signed-off-by: Jason <[email protected]> * FilterbankFeaturesTA to match FilterbankFeatures (#5913) Signed-off-by: Mohamed Saad Ibn Seddik <[email protected]> Signed-off-by: Jason <[email protected]> * fixed missing long_description_content_type (#5909) Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Jason <[email protected]> * added TPMLP for T5-based models (#5840) (#5841) Signed-off-by: David Mosallanezhad <[email protected]> Co-authored-by: David <[email protected]> Co-authored-by: David Mosallanezhad <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Jason <[email protected]> * Fixing 0-size issue and ONNX BS>1 trace Signed-off-by: Boris Fomitchev <[email protected]> Signed-off-by: Jason <[email protected]> * Fixing code scan alert Signed-off-by: Boris Fomitchev <[email protected]> Signed-off-by: 
Jason <[email protected]> * update container (#5917) Signed-off-by: ericharper <[email protected]> Signed-off-by: Jason <[email protected]> * remove conda pynini install (#5921) Signed-off-by: ekmb <[email protected]> Signed-off-by: Jason <[email protected]> * Merge release main (#5916) * update branch Signed-off-by: ericharper <[email protected]> * added TPMLP for T5-based models (#5840) Signed-off-by: David Mosallanezhad <[email protected]> Signed-off-by: David Mosallanezhad <[email protected]> Co-authored-by: David Mosallanezhad <[email protected]> * remove notebook (#5859) Signed-off-by: ericharper <[email protected]> Signed-off-by: ericharper <[email protected]> * update branch Signed-off-by: ericharper <[email protected]> --------- Signed-off-by: ericharper <[email protected]> Signed-off-by: David Mosallanezhad <[email protected]> Co-authored-by: David <[email protected]> Co-authored-by: David Mosallanezhad <[email protected]> Signed-off-by: Jason <[email protected]> * Dynamic freezing in Nemo (#5879) * Initial commit for dynamic freezing logic Signed-off-by: Daniel Egert <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Updated logic to handle lists and updated docs Signed-off-by: Daniel Egert <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Transferred dynamic freezing logic to core from asr Signed-off-by: Daniel Egert <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Revert asr config to original Signed-off-by: Daniel Egert <[email protected]> * Fixed tab indent in core.rst Signed-off-by: Daniel Egert <[email protected]> * Updated modelPT for latest from master Signed-off-by: Daniel Egert <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixed indents in docs 
Signed-off-by: Daniel Egert <[email protected]> --------- Signed-off-by: Daniel Egert <[email protected]> Co-authored-by: Daniel Egert <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Jason <[email protected]> * Fix Windows bug with save_restore_connector (#5919) * Initial commit for Windows bug with save_to Signed-off-by: Daniel Egert <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Daniel Egert <[email protected]> Co-authored-by: Daniel Egert <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Jason <[email protected]> * add new lannguages to doc (#5939) Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Jason <[email protected]> * Workarounds for ONNX export with autocast Signed-off-by: Boris Fomitchev <[email protected]> Signed-off-by: Jason <[email protected]> * fix val loss computation in megatron (#5871) * fix val loss computation in megatron * Fix NaN handling during validation --------- Co-authored-by: ANMOL GUPTA <[email protected]> Co-authored-by: Mikołaj Błaż <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Jason <[email protected]> * Restoring sigmas Signed-off-by: Boris Fomitchev <[email protected]> Signed-off-by: Jason <[email protected]> * Add core classes and functions for online clustering diarizer part 2 (#5609) * Add core classes and functions for online clustering diarizer Signed-off-by: Taejin Park <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add audio to labels code Signed-off-by: Taejin Park <[email protected]> * resolve type errors Signed-off-by: Taejin Park <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see 
https://pre-commit.ci * added unit=tests for very short audio Signed-off-by: Taejin Park <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Filled all missing docstrings Signed-off-by: Taejin Park <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * resolved conflict and added missing docstrings Signed-off-by: Taejin Park <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixed unit-test errors Signed-off-by: Taejin Park <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix the wrongly added file - megatron_gpt_model.py Signed-off-by: Taejin Park <[email protected]> * Fix wrongly included file - megatron_gpt_model.py Signed-off-by: Taejin Park <[email protected]> * resolve code quality issue Signed-off-by: Taejin Park <[email protected]> * Fixed unit-test errors and bugs Signed-off-by: Taejin Park <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * changed total_sec for offline_clustering toy_data in unit-tests Signed-off-by: Taejin Park <[email protected]> * fixed merging index offset bug Signed-off-by: Taejin Park <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * only including part 1 files Signed-off-by: Taejin Park <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * removed unused function Signed-off-by: Taejin Park <[email protected]> * fixed unused imports Signed-off-by: Taejin Park <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * divided nmesc_clustering.py into two and 
reflected first-pass comments Signed-off-by: Taejin Park <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * adding offline/online_clustering.py Signed-off-by: Taejin Park <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix code QL autocomment Signed-off-by: Taejin Park <[email protected]> * Removed unused imports Signed-off-by: Taejin Park <[email protected]> * Update nemo/collections/asr/parts/utils/online_clustering.py Co-authored-by: Sean Naren <[email protected]> Signed-off-by: Taejin Park <[email protected]> * Reflected comments Signed-off-by: Taejin Park <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * resolved code scanning issue Signed-off-by: Taejin Park <[email protected]> * Adding online_diarizer.py Signed-off-by: Taejin Park <[email protected]> * updated tests and speaker_utils Signed-off-by: Taejin Park <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixed the wrong test eval Signed-off-by: Taejin Park <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * updating online diarizer for varialbe name change Signed-off-by: Taejin Park <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Reflected comments and some typo fixes in speaker_utils Signed-off-by: Taejin Park <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Taejin Park <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Sean Naren <[email protected]> Signed-off-by: Jason <[email protected]> * Distributed Adam optimizer overlaps param all-gather with forward compute (#5684) * Add distopt support for overlapping param all-gather with forward compute Signed-off-by: Tim Moon <[email protected]> * Update Apex commit Signed-off-by: Tim Moon <[email protected]> --------- Signed-off-by: Tim Moon <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Jason <[email protected]> * [TTS][ZH] added new NGC model cards with polyphone disambiguation. (#5940) * [TTS][ZH] added new NGC model cards with polyphone disambiguation. Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Jason <[email protected]> * Moved truncation of context higher up Signed-off-by: Boris Fomitchev <[email protected]> Signed-off-by: Jason <[email protected]> * [TN] bugfix file handler is not closed. (#5955) Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Jason <[email protected]> * Added unit test for regulate_len. Unscripted sort_tensor for TRT Signed-off-by: Boris Fomitchev <[email protected]> Signed-off-by: Jason <[email protected]> * Fixed slice Signed-off-by: Boris Fomitchev <[email protected]> Signed-off-by: Jason <[email protected]> * [TTS] deprecate AudioToCharWithPriorAndPitchDataset. (#5959) Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Jason <[email protected]> * bugfix: file handlers are not closed. 
(#5956) Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Jason <[email protected]> * [TTS][G2P] deprecate add_symbols (#5961) Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Jason <[email protected]> * fix broken link (#5968) Signed-off-by: ericharper <[email protected]> Signed-off-by: Jason <[email protected]> * Fix hybridasr bug (#5950) (#5957) Signed-off-by: Jason <[email protected]> * Added list_available_models (#5967) * Added list_available_models Signed-off-by: Evgeniy Shabalin <[email protected]> * Added to readme Signed-off-by: Evgeniy Shabalin <[email protected]> * added vits to docs Signed-off-by: Evgeniy Shabalin <[email protected]> * added vits to docs Signed-off-by: Evgeniy Shabalin <[email protected]> --------- Signed-off-by: Evgeniy Shabalin <[email protected]> Signed-off-by: Evgeniy Shabalin <[email protected]> Signed-off-by: Jason <[email protected]> * Move settings to `pyproject.toml`. Remove deprecated `pytest-runner` (#5947) * Move project settings to pyproject.toml Signed-off-by: Vladimir Bataev <[email protected]> * Remove setup.cfg Signed-off-by: Vladimir Bataev <[email protected]> * Remove deprecated pytest-runner Signed-off-by: Vladimir Bataev <[email protected]> * Add comments Signed-off-by: Vladimir Bataev <[email protected]> * Allow only registered markers for pytest Signed-off-by: Vladimir Bataev <[email protected]> --------- Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Jason <[email protected]> * Fix torchaudio installation (#5850) * Fail if torchaudio not installed Signed-off-by: Vladimir Bataev <[email protected]> * Fix torchaudio matching version Signed-off-by: Vladimir Bataev <[email protected]> * Warn if Pytorch major version changed Signed-off-by: Vladimir Bataev <[email protected]> --------- Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Jason <[email protected]> * Update fastpitch.py (#5969) Signed-off-by: Jason <[email protected]> * Review comments 
Signed-off-by: Boris Fomitchev <[email protected]> Signed-off-by: Jason <[email protected]> * per-micro-batch input loader (#5635) * per-micro-batch input loader * per-micro-batch input loader set arg default val * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * minor fix * apply per-microbatch-loader to only GPT * update docstring on micro-batch input loader * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixed the default arg val * fix batch size to 1 at log stat registration * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update container for CI Signed-off-by: ericharper <[email protected]> * update container in jenkinsfile Signed-off-by: ericharper <[email protected]> * update container for CI Signed-off-by: ericharper <[email protected]> fix merge conflict * revert Jenkinsfile * Revert "revert Jenkinsfile" This reverts commit d23b7757e0f935dacde2840f234193c632a2b3be. 
* Update nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py Signed-off-by: Tim Moon <[email protected]> * add GradScaler * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: ericharper <[email protected]> Signed-off-by: Tim Moon <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: ericharper <[email protected]> Co-authored-by: Tim Moon <[email protected]> Signed-off-by: Jason <[email protected]> * update container in readme (#5981) Signed-off-by: fayejf <[email protected]> Signed-off-by: Jason <[email protected]> * Support Alignment Extraction for all RNNT Beam decoding methods (#5925) * Partial impl of ALSD alignment extraction Signed-off-by: smajumdar <[email protected]> * Partial impl of ALSD alignment extraction Signed-off-by: smajumdar <[email protected]> * Remove everything else Signed-off-by: smajumdar <[email protected]> * Support dataclass in AbstractRNNTDecoding Signed-off-by: smajumdar <[email protected]> * Add first draft unittest Signed-off-by: smajumdar <[email protected]> * Correct the logic to more to the next timestep in the alignment Signed-off-by: smajumdar <[email protected]> * Finalize ALSD alignment generation Signed-off-by: smajumdar <[email protected]> * Add support for TSD greedy alignment extraction Signed-off-by: smajumdar <[email protected]> * Add support for mAES greedy alignment extraction Signed-off-by: smajumdar <[email protected]> * Finalize extraction of alignments from all beam algorithms for RNNT Signed-off-by: smajumdar <[email protected]> * Style fixes Signed-off-by: smajumdar <[email protected]> * Add copyright Signed-off-by: smajumdar <[email protected]> * Address comments Signed-off-by: smajumdar <[email protected]> --------- Signed-off-by: smajumdar <[email protected]> Signed-off-by: Jason <[email protected]> * Add AWS SageMaker ASR Examples (#5638) * Base code 
for AWS SageMaker example Signed-off-by: SeanNaren <[email protected]> * Remove format Signed-off-by: SeanNaren <[email protected]> * wrap Signed-off-by: SeanNaren <[email protected]> * Add a notebook with the code Signed-off-by: SeanNaren <[email protected]> * Setup Signed-off-by: SeanNaren <[email protected]> * Update notebook Signed-off-by: SeanNaren <[email protected]> * Remove space Signed-off-by: SeanNaren <[email protected]> * Fix spelling mistake Signed-off-by: SeanNaren <[email protected]> * Add message to explain usage Signed-off-by: SeanNaren <[email protected]> * Add CommonVoice esperanto example Signed-off-by: SeanNaren <[email protected]> * Fix path Signed-off-by: SeanNaren <[email protected]> * Fixes Signed-off-by: SeanNaren <[email protected]> * Import sox locally, add documentation Signed-off-by: SeanNaren <[email protected]> * Address reviews Signed-off-by: SeanNaren <[email protected]> * Address reviews Signed-off-by: SeanNaren <[email protected]> * Address reviews Signed-off-by: SeanNaren <[email protected]> * Add cell to download the SSL model Signed-off-by: SeanNaren <[email protected]> * Set max epochs to 300 Signed-off-by: SeanNaren <[email protected]> * Fixes, introduce HF dataset instructions Signed-off-by: SeanNaren <[email protected]> * Upstream updates from other branch Signed-off-by: SeanNaren <[email protected]> * Fix warning Signed-off-by: SeanNaren <[email protected]> * Add README, add image Signed-off-by: SeanNaren <[email protected]> * Fix warning Signed-off-by: SeanNaren <[email protected]> * Address feedback Signed-off-by: SeanNaren <[email protected]> * Feedback Signed-off-by: SeanNaren <[email protected]> --------- Signed-off-by: SeanNaren <[email protected]> Signed-off-by: Jason <[email protected]> * Update PUBLICATIONS.md (#5963) * Add papers from 2022/2022 to PUBLICATIONS.md Signed-off-by: smajumdar <[email protected]> * Remove ipynb from being tracked as for nemo code library Signed-off-by: smajumdar <[email protected]> * 
Remove ipynb from being tracked as for nemo code library Signed-off-by: smajumdar <[email protected]> * Add additional papers Signed-off-by: smajumdar <[email protected]> --------- Signed-off-by: smajumdar <[email protected]> Signed-off-by: Jason <[email protected]> * [G2P] fixed typos and broken import library. (#5978) (#5979) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Jason <[email protected]> * [G2P] added backward compatibility for english tokenizer and fixed unit tests (#5980) (#5984) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Micha Livne <[email protected]> Signed-off-by: Jason <[email protected]> Signed-off-by: Matvei Novikov <[email protected]> Signed-off-by: nithinraok <[email protected]> Signed-off-by: fayejf <[email protected]> Signed-off-by: fayejf <[email protected]> Signed-off-by: Boris Fomitchev <[email protected]> Signed-off-by: stevehuang52 <[email protected]> Signed-off-by: Taejin Park <[email protected]> Signed-off-by: Yi Dong <[email protected]> Signed-off-by: Roman Korostik <[email protected]> Signed-off-by: Roman Korostik <[email protected]> Signed-off-by: Tim Moon <[email protected]> Signed-off-by: Jean-Louis Queguiner <[email protected]> Signed-off-by: ekmb <[email protected]> Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Jocelyn Huang <[email protected]> Signed-off-by: SeanNaren <[email protected]> Signed-off-by: gabitza-tech <[email protected]> Signed-off-by: ericharper <[email protected]> Signed-off-by: athitten <[email protected]> Signed-off-by: Ante Jukić <[email protected]> Signed-off-by: smajumdar <[email protected]> Signed-off-by: CaraDuf <[email protected]> Signed-off-by: MaximumEntropy <[email protected]> Signed-off-by: Micha Livne <[email protected]> Signed-off-by: Mohamed Saad Ibn Seddik <[email protected]> Signed-off-by: 
Xuesong Yang <[email protected]> Signed-off-by: David Mosallanezhad <[email protected]> Signed-off-by: Daniel Egert <[email protected]> Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Evgeniy Shabalin <[email protected]> Signed-off-by: Evgeniy Shabalin <[email protected]> Signed-off-by: Tim Moon <[email protected]> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Micha Livne <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Micha Livne <[email protected]> Co-authored-by: Matvei Novikov <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: fayejf <[email protected]> Co-authored-by: He Huang (Steve) <[email protected]> Co-authored-by: Taejin Park <[email protected]> Co-authored-by: Yi Dong <[email protected]> Co-authored-by: Roman Korostik <[email protected]> Co-authored-by: Tim Moon <[email protected]> Co-authored-by: Jean-Louis Queguiner <[email protected]> Co-authored-by: Evelina <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Vladimir Bataev <[email protected]> Co-authored-by: Mikyas Desta <[email protected]> Co-authored-by: Jocelyn <[email protected]> Co-authored-by: Sean Naren <[email protected]> Co-authored-by: Gabriel Pirlogeanu <[email protected]> Co-authored-by: anmolgupt <[email protected]> Co-authored-by: ANMOL GUPTA <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: Zhilin Wang <[email protected]> Co-authored-by: athitten <[email protected]> Co-authored-by: anteju <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: CaraDuf <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Co-authored-by: Micha Livne <[email protected]> Co-authored-by: Mohamed Saad Ibn Seddik <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: 
David <[email protected]> Co-authored-by: David Mosallanezhad <[email protected]> Co-authored-by: trias702 <[email protected]> Co-authored-by: Daniel Egert <[email protected]> Co-authored-by: Yang Zhang <[email protected]> Co-authored-by: Mikołaj Błaż <[email protected]> Co-authored-by: Evgeniy Shabalin <[email protected]> Co-authored-by: Jason <[email protected]> Co-authored-by: Sangkug Lym <[email protected]>

Signed-off-by: nithinraok <[email protected]> Co-authored-by: Nithin Rao <[email protected]>

Signed-off-by: Jocelyn Huang <[email protected]>

* fast conformer configs and doc * feedback * adding fast conformer to main README * path changes * rewording * further doc changes * naming --------- Signed-off-by: Dima Rekesh <[email protected]> Co-authored-by: Dima Rekesh <[email protected]> Co-authored-by: Dima Rekesh <[email protected]>

…ulator (#5897) * fix silence insertion Signed-off-by: stevehuang52 <[email protected]> * update docs and tutorial Signed-off-by: stevehuang52 <[email protected]> * update Signed-off-by: stevehuang52 <[email protected]> * change to beta and gamma distributions Signed-off-by: stevehuang52 <[email protected]> * update Signed-off-by: stevehuang52 <[email protected]> * fix typo Signed-off-by: stevehuang52 <[email protected]> * Added silence vs overlap selector with overlap algo Signed-off-by: Taejin Park <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Function name change and fixes Signed-off-by: Taejin Park <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update silence and overlap adding algorithm for better accuracy Signed-off-by: Taejin Park <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Recommended range for overlap mean Signed-off-by: Taejin Park <[email protected]> * Changing yaml file default values Signed-off-by: Taejin Park <[email protected]> * Fixed typos and errors in docstrings Signed-off-by: Taejin Park <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixed minor bugs and removed unused functions Signed-off-by: Taejin Park <[email protected]> * Fixed minor bugs and removed unused imports Signed-off-by: Taejin Park <[email protected]> * Added docstrings for newly updated overlap algos Signed-off-by: Taejin Park <[email protected]> * Fixed non_silence_len_samples calculation, more accurate now Signed-off-by: Taejin Park <[email protected]> * adding missing docstring for non_silence_len Signed-off-by: Taejin Park <[email protected]> * removed ipdb lines Signed-off-by: Taejin Park <[email protected]> * refactor and update Signed-off-by: stevehuang52 
<[email protected]> * updated logs for v1.1 Signed-off-by: Taejin Park <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Argument check update for mean=0 var=0 case Signed-off-by: Taejin Park <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo Signed-off-by: stevehuang52 <[email protected]> * update silence/overlap mean clipping Signed-off-by: stevehuang52 <[email protected]> * Adding mean clipping Signed-off-by: Taejin Park <[email protected]> * added 0 handling for ovl/sim_mean Signed-off-by: Taejin Park <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Tested on fisher and fixed the bug with string-speaker ID Signed-off-by: Taejin Park <[email protected]> * update code for visualization Signed-off-by: stevehuang52 <[email protected]> * refactor Signed-off-by: stevehuang52 <[email protected]> * fix load_rttm Signed-off-by: stevehuang52 <[email protected]> * Adding docstrings Signed-off-by: Taejin Park <[email protected]> * Adding usage in the analysis script Signed-off-by: Taejin Park <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix filename Signed-off-by: stevehuang52 <[email protected]> * Added argument check for sentence length params Signed-off-by: Taejin Park <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Removed unnecessary NB torch sampling Signed-off-by: Taejin Park <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add build_synthetic_vad_manifest.py Signed-off-by: stevehuang52 <[email protected]> * add check for non rttm files Signed-off-by: stevehuang52 <[email protected]> * added docstrings 
Signed-off-by: Taejin Park <[email protected]> * typo is fixed Signed-off-by: Taejin Park <[email protected]> * License template was missing, added Signed-off-by: Taejin Park <[email protected]> * add missing copyright and move script Signed-off-by: stevehuang52 <[email protected]> * add missing comma Signed-off-by: stevehuang52 <[email protected]> --------- Signed-off-by: stevehuang52 <[email protected]> Signed-off-by: Taejin Park <[email protected]> Co-authored-by: Taejin Park <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

Signed-off-by: Eduard Maghakyan <[email protected]>

Signed-off-by: Artem Zemliak <[email protected]> Co-authored-by: Artem Zemliak <[email protected]>

* retrieval service separation Signed-off-by: Yi Dong <[email protected]> * refactor service code Signed-off-by: Yi Dong <[email protected]> * fix name Signed-off-by: Yi Dong <[email protected]> * add combo server Signed-off-by: Yi Dong <[email protected]> * added combo files Signed-off-by: Yi Dong <[email protected]> * fix the bug Signed-off-by: Yi Dong <[email protected]> * add retrieval service Signed-off-by: Yi Dong <[email protected]> * fix updatable flag Signed-off-by: Yi Dong <[email protected]> * working example Signed-off-by: Yi Dong <[email protected]> * separate text generation server Signed-off-by: Yi Dong <[email protected]> * added webserver Signed-off-by: Yi Dong <[email protected]> * clean up and fix zero neighbor issue Signed-off-by: Yi Dong <[email protected]> * fix the style Signed-off-by: Yi Dong <[email protected]> * add license Signed-off-by: Yi Dong <[email protected]> * fixed code QL Signed-off-by: Yi Dong <[email protected]> * added bash script to launch the demo Signed-off-by: Yi Dong <[email protected]> * clean up Signed-off-by: Yi Dong <[email protected]> --------- Signed-off-by: Yi Dong <[email protected]>

* Use module-based k2 import guard Signed-off-by: Vladimir Bataev <[email protected]> --------- Signed-off-by: Vladimir Bataev <[email protected]>

* storing * Added VITS documentation Signed-off-by: Evgeniy Shabalin <[email protected]> * Added VITS documentation Signed-off-by: Evgeniy Shabalin <[email protected]> * Cleaned stuff Signed-off-by: Evgeniy Shabalin <[email protected]> * Cleaned stuff Signed-off-by: Evgeniy Shabalin <[email protected]> * cleaning Signed-off-by: Evgeniy Shabalin <[email protected]> * Typos Signed-off-by: Evgeniy Shabalin <[email protected]> * Added experimental note Signed-off-by: Evgeniy Shabalin <[email protected]> --------- Signed-off-by: Evgeniy Shabalin <[email protected]>

Signed-off-by: Abhinav Khattar <[email protected]> Co-authored-by: Abhinav Khattar <[email protected]>

Signed-off-by: Boris Fomitchev <[email protected]>

Signed-off-by: smajumdar <[email protected]>

Signed-off-by: ekmb <[email protected]> Co-authored-by: Evelina <[email protected]>

Signed-off-by: smajumdar <[email protected]> Co-authored-by: Eric Harper <[email protected]>

* Added documentation section for ASR datasets from AIStore Signed-off-by: Ante Jukić <[email protected]> * Address review comments Signed-off-by: Ante Jukić <[email protected]> --------- Signed-off-by: Ante Jukić <[email protected]>

commit b31f117 Author: Boris Fomitchev <[email protected]> Date: Tue Feb 14 15:12:30 2023 -0800 TJ hacks Signed-off-by: Boris Fomitchev <[email protected]> commit 7caae20 Author: Boris Fomitchev <[email protected]> Date: Tue Feb 14 10:06:04 2023 -0800 Ragged batching changes for RadTTS, some refactoring Signed-off-by: Boris Fomitchev <[email protected]> Signed-off-by: Boris Fomitchev <[email protected]> Co-authored-by: Jason <[email protected]>

* quick fix Signed-off-by: Jason <[email protected]> * undo Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Jason <[email protected]>

Signed-off-by: Xuesong Yang <[email protected]>

* GPT no longer explicitly overlaps distopt communication with forward compute Signed-off-by: Tim Moon <[email protected]> * Remove unused import Signed-off-by: Tim Moon <[email protected]> --------- Signed-off-by: Tim Moon <[email protected]>

…cation (#6024) Signed-off-by: Tim Moon <[email protected]>

* remove TN Signed-off-by: ekmb <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix imports Signed-off-by: ekmb <[email protected]> * fix import Signed-off-by: ekmb <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add missing init Signed-off-by: ekmb <[email protected]> * fix import Signed-off-by: ekmb <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * rename unit test Signed-off-by: ekmb <[email protected]> * fix import Signed-off-by: ekmb <[email protected]> * fix modules test Signed-off-by: ekmb <[email protected]> * fix imports Signed-off-by: ekmb <[email protected]> * remove whitelist from config Signed-off-by: ekmb <[email protected]> * delete wordid file Signed-off-by: ekmb <[email protected]> * remove pynini_install from tutorials Signed-off-by: ekmb <[email protected]> * update requirements Signed-off-by: ekmb <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add support warning Signed-off-by: ekmb <[email protected]> * review Signed-off-by: ekmb <[email protected]> --------- Signed-off-by: ekmb <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* patch to allow using tokenizers without additional_special_tokens_ids attribute Signed-off-by: arendu <[email protected]> * early stop callback for prompt/p tuning Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update Signed-off-by: arendu <[email protected]> * added exp manager config for early stop Signed-off-by: arendu <[email protected]> * pushed logic for creating early stopping inside exp manager Signed-off-by: arendu <[email protected]> * pushed logic for creating early stopping inside exp manager Signed-off-by: arendu <[email protected]> * minor updates and added dataclass check Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * more args Signed-off-by: arendu <[email protected]> * more args Signed-off-by: arendu <[email protected]> --------- Signed-off-by: arendu <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

Signed-off-by: Boris Fomitchev <[email protected]>

Signed-off-by: Ryan <[email protected]>

* Tn doc 16 (#5954) * fix new repo links Signed-off-by: Yang Zhang <[email protected]>

Add model.eval() to ensure the accuracy. Signed-off-by: Slyne Deng <[email protected]> Co-authored-by: Slyne Deng <[email protected]>

* add random seed in perturb Signed-off-by: fayejf <[email protected]> * small update Signed-off-by: fayejf <[email protected]> * update evaluator config Signed-off-by: fayejf <[email protected]> * update tutorial Signed-off-by: fayejf <[email protected]> * update add_noise Signed-off-by: fayejf <[email protected]> --------- Signed-off-by: fayejf <[email protected]>

Signed-off-by: fayejf <[email protected]>

* Add Customization Dataset Preparation Tool Allows users to read data into prompt-and-completion format .jsonl as expected by the Customization service/NeMo LLM P tuning service Signed-off-by: Zhilin Wang [email protected] * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add license and usage examples, remove tutorial Signed-off-by: Zhilin Wang [email protected] * Fix typo Signed-off-by: Zhilin Wang [email protected] * Fix some more typos --------- Signed-off-by: Zhilin Wang [email protected] Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Oleksii Kuchaiev <[email protected]>

* Some simplifications Signed-off-by: Igor Gitman <[email protected]> * Add tests for stochastic depth Signed-off-by: Igor Gitman <[email protected]> * Fix tests for stochastic depth Signed-off-by: Igor Gitman <[email protected]> * Add interctc loss and logs Signed-off-by: Igor Gitman <[email protected]> * Fix a few issues Signed-off-by: Igor Gitman <[email protected]> * Add interctc loss tests Signed-off-by: Igor Gitman <[email protected]> * Add docs Signed-off-by: Igor Gitman <[email protected]> * Add training_step test for interctc Signed-off-by: Igor Gitman <[email protected]> * Refactoring with AccessMixin WIP Signed-off-by: Igor Gitman <[email protected]> * Separate interctc logic into a mixin Signed-off-by: Igor Gitman <[email protected]> * Fix tests Signed-off-by: Igor Gitman <[email protected]> * Fix some lint errors Signed-off-by: Igor Gitman <[email protected]> * Small refactoring Signed-off-by: Igor Gitman <[email protected]> * Add more docs, fix PR comments Signed-off-by: Igor Gitman <[email protected]> * Add other encoder support + more refactoring Signed-off-by: Igor Gitman <[email protected]> * Add more config examples Signed-off-by: Igor Gitman <[email protected]> * Move stochastic depth setup to utils Signed-off-by: Igor Gitman <[email protected]> * Add interctc_enabled setter + more docs Signed-off-by: Igor Gitman <[email protected]> * Fix a few doc strings for better web display Signed-off-by: Igor Gitman <[email protected]> * Update CTC flow diagram Signed-off-by: Igor Gitman <[email protected]> --------- Signed-off-by: Igor Gitman <[email protected]>

* Add pyctcdecode to high level beam search API Signed-off-by: smajumdar <[email protected]> * Remove redundant assignment Signed-off-by: smajumdar <[email protected]> --------- Signed-off-by: smajumdar <[email protected]>

* Initial Signed-off-by: MaximumEntropy <[email protected]> * Multiple fixes Signed-off-by: MaximumEntropy <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: MaximumEntropy <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add to CI test Signed-off-by: MaximumEntropy <[email protected]> * Fix Signed-off-by: MaximumEntropy <[email protected]> * check position embs for gpt prompt learning Signed-off-by: Adi Renduchintala <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update args Signed-off-by: MaximumEntropy <[email protected]> * Disable tts unit test Signed-off-by: MaximumEntropy <[email protected]> * Fix Signed-off-by: MaximumEntropy <[email protected]> * Fix Signed-off-by: MaximumEntropy <[email protected]> * Empty Signed-off-by: MaximumEntropy <[email protected]> * Update Jenkinsfile Changed optimizer for GPT training from 'fused_adam' to 'distributed_fused_adam'. Signed-off-by: khcs <[email protected]> * update config to to use correct key Signed-off-by: ericharper <[email protected]> * revert Jenkinsfile back to fused_adam Signed-off-by: ericharper <[email protected]> --------- Signed-off-by: MaximumEntropy <[email protected]> Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: khcs <[email protected]> Signed-off-by: ericharper <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: khcs <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: ericharper <[email protected]>

* patch to allow using tokenizers without additional_special_tokens_ids attribute Signed-off-by: arendu <[email protected]> * early stop callback for prompt/p tuning Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update Signed-off-by: arendu <[email protected]> * added exp manager config for early stop Signed-off-by: arendu <[email protected]> * pushed logic for creating early stopping inside exp manager Signed-off-by: arendu <[email protected]> * pushed logic for creating early stopping inside exp manager Signed-off-by: arendu <[email protected]> * minor updates and added dataclass check Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * more args Signed-off-by: arendu <[email protected]> * more args Signed-off-by: arendu <[email protected]> * wrap tpmlp inside prompt encoder Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * updates removed unused imports Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * removes typecheck for tpmlp module Signed-off-by: arendu <[email protected]> --------- Signed-off-by: arendu <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

Signed-off-by: Boris Fomitchev <[email protected]>

* cache-aware streaming export Test onnx streaming conformer ctc WER Constant att cache width with len param Remove some extra functions in cache_aware runner transpose cache so that batch is first for trt Signed-off-by: Greg Clark <[email protected]> * fix export for full-context conformer * WIP trying to improve onnx perf Signed-off-by: Greg Clark <[email protected]> * Adding test scripts Signed-off-by: Greg Clark <[email protected]> * More perf testing script Signed-off-by: Greg Clark <[email protected]> * Updates for jit torch_tensorrt tracing Signed-off-by: Greg Clark <[email protected]> * Fixed trace warnings Signed-off-by: Boris Fomitchev <[email protected]> * Rearranging tests Signed-off-by: Boris Fomitchev <[email protected]> * Fixing non-caching case Signed-off-by: Boris Fomitchev <[email protected]> * testing Signed-off-by: Boris Fomitchev <[email protected]> * Fixed channel cache length issue Signed-off-by: Boris Fomitchev <[email protected]> * cache-aware streaming export Test onnx streaming conformer ctc WER Constant att cache width with len param Remove some extra functions in cache_aware runner transpose cache so that batch is first for trt Signed-off-by: Greg Clark <[email protected]> * fix export for full-context conformer * WIP trying to improve onnx perf Signed-off-by: Greg Clark <[email protected]> * Adding test scripts Signed-off-by: Greg Clark <[email protected]> * More perf testing script Signed-off-by: Greg Clark <[email protected]> * Updates for jit torch_tensorrt tracing Signed-off-by: Greg Clark <[email protected]> * stash Signed-off-by: Boris Fomitchev <[email protected]> * Reverting non-essential changes Signed-off-by: Boris Fomitchev <[email protected]> * Offset=None case Signed-off-by: Boris Fomitchev <[email protected]> * Remove test scripts Signed-off-by: Greg Clark <[email protected]> * Clean up speech_to_text_cache_aware_streaming_infer Signed-off-by: Greg Clark <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com 
hooks for more information, see https://pre-commit.ci * Revert pad -> constant_pad_nd Signed-off-by: Greg Clark <[email protected]> * conformer-encoder set window_size from streaming_cfg Signed-off-by: Greg Clark <[email protected]> * Fixes for working export(), using more constants Signed-off-by: Boris Fomitchev <[email protected]> * Optional rand init for cahce Signed-off-by: Greg Clark <[email protected]> * Folding update_cache with constants Signed-off-by: Boris Fomitchev <[email protected]> * More folding Signed-off-by: Boris Fomitchev <[email protected]> * Reducing diff #1 Signed-off-by: Boris Fomitchev <[email protected]> * Reducing diff #2 Signed-off-by: Boris Fomitchev <[email protected]> * Reducing diff #3 Signed-off-by: Boris Fomitchev <[email protected]> * Fixed unit tests, more reverts Signed-off-by: Boris Fomitchev <[email protected]> * Export fixes Signed-off-by: Boris Fomitchev <[email protected]> * Reverted slice changes that ruined ONNX perf Signed-off-by: Boris Fomitchev <[email protected]> * Adding back keep_all_outputs and drop_extra_preencoded Signed-off-by: Greg Clark <[email protected]> * Fix export Signed-off-by: Greg Clark <[email protected]> --------- Signed-off-by: Greg Clark <[email protected]> Signed-off-by: Boris Fomitchev <[email protected]> Co-authored-by: Boris Fomitchev <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Vahid Noroozi <[email protected]>

erhoo82 and others added 30 commits February 9, 2023 13:44

update container in readme (#5981)

2cc0942

Signed-off-by: fayejf <[email protected]>

[G2P] fixed typos and broken import library. (#5978) (#5979)

fe36f2b

Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]>

[G2P] added backward compatibility for english tokenizer and fixed un…

90ca7b1

…it tests (#5980) (#5984) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]>

replace symbols (#5974) (#5990)

bfd371d

Signed-off-by: nithinraok <[email protected]> Co-authored-by: Nithin Rao <[email protected]>

Add some info about FastPitch SSL model (#5994)

349b095

Signed-off-by: Jocelyn Huang <[email protected]>

import missing dependency - gc (#6001)

d0ac3e0

Signed-off-by: Eduard Maghakyan <[email protected]>

Fix hybrid transcribe (#6003)

6c22356

Signed-off-by: Artem Zemliak <[email protected]> Co-authored-by: Artem Zemliak <[email protected]>

make validation accuracy reporting optional for adapters/ptuning (#5843)

99652c7

Use module-based k2 import guard (#6006)

c640325

* Use module-based k2 import guard Signed-off-by: Vladimir Bataev <[email protected]> --------- Signed-off-by: Vladimir Bataev <[email protected]>

Fix Prompt text space issue (#5983) (#5993)

c93c5a5

Signed-off-by: Abhinav Khattar <[email protected]> Co-authored-by: Abhinav Khattar <[email protected]>

Ragged batching changes for RadTTS, some refactoring (#6020)

cbdff07

Signed-off-by: Boris Fomitchev <[email protected]>

Default RNNT loss to int64 targets (#6011)

3e74995

Signed-off-by: smajumdar <[email protected]>

limit files for cut_audio.py (#6009) (#6018)

57bd9d5

Signed-off-by: ekmb <[email protected]> Co-authored-by: Evelina <[email protected]>

Fix reinstall.sh dependencies (#6027)

aac473a

Signed-off-by: smajumdar <[email protected]> Co-authored-by: Eric Harper <[email protected]>

Quick Fix for RadTTS test (#6034)

3a86da6

* quick fix Signed-off-by: Jason <[email protected]> * undo Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Jason <[email protected]>

correct bash style according to SC2236. (#6025)

189d39b

Signed-off-by: Xuesong Yang <[email protected]>

Add BERT support for overlapping forward compute with distopt communi…

d4bea87

…cation (#6024) Signed-off-by: Tim Moon <[email protected]>

ekmb and others added 12 commits February 15, 2023 14:22

Disabling radtts tests until we have real model (#6036)

ccfca84

Signed-off-by: Boris Fomitchev <[email protected]>

[TTS] Add Spanish IPA dictionaries and heteronyms (#6037)

e23d62b

Signed-off-by: Ryan <[email protected]>

Pr doc tn (#6041)

4f06f34

* Tn doc 16 (#5954) * fix new repo links Signed-off-by: Yang Zhang <[email protected]>

Update align.py (#6043) (#6045)

cdbb924

Add model.eval() to ensure the accuracy. Signed-off-by: Slyne Deng <[email protected]> Co-authored-by: Slyne Deng <[email protected]>

fix typo in asr evaluator readme (#6053)

cede377

Signed-off-by: fayejf <[email protected]>

Add pyctcdecode to high level beam search API (#6026)

8e6f36a

* Add pyctcdecode to high level beam search API Signed-off-by: smajumdar <[email protected]> * Remove redundant assignment Signed-off-by: smajumdar <[email protected]> --------- Signed-off-by: smajumdar <[email protected]>

github-actions bot added ASR NLP TTS core common CI labels Feb 18, 2023

borisfom closed this Feb 19, 2023

messiaen pushed a commit that referenced this pull request Mar 7, 2023

Reducing diff #1

b39407d

Signed-off-by: Boris Fomitchev <[email protected]>
